Isoelastic Agents and Wealth Updates in Machine Learning Markets
Recently, prediction markets have shown considerable promise for developing
flexible mechanisms for machine learning. In this paper, agents with isoelastic
utilities are considered. It is shown that the costs associated with
homogeneous markets of agents with isoelastic utilities produce equilibrium
prices corresponding to alpha-mixtures, with a particular form of mixing
component relating to each agent's wealth. We also demonstrate that wealth
accumulation for logarithmic and other isoelastic agents (through payoffs on
prediction of training targets) can implement both Bayesian model updates and
mixture weight updates by imposing different market payoff structures. An
iterative algorithm is given for market equilibrium computation. We demonstrate
that inhomogeneous markets of agents with isoelastic utilities outperform
state-of-the-art aggregate classifiers such as random forests, as well as
single classifiers (neural networks, decision trees) on a number of machine
learning benchmarks, and show that isoelastic combination methods are generally
better than their logarithmic counterparts.
Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012).
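The wealth-weighted alpha-mixture aggregation described above can be sketched as follows. The Amari-style alpha-representation, the wealth normalisation, and the function name `alpha_mixture` are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def alpha_mixture(probs, wealth, alpha):
    """Wealth-weighted alpha-mixture of agent predictive distributions.

    probs:  (n_agents, n_classes) array, each row a probability vector.
    wealth: (n_agents,) non-negative weights, normalised internally.
    alpha = -1 recovers the linear opinion pool; alpha -> 1 gives the
    geometric (logarithmic) pool as a limit case.
    """
    w = np.asarray(wealth, dtype=float)
    w = w / w.sum()
    p = np.asarray(probs, dtype=float)
    if np.isclose(alpha, 1.0):
        # limit case: wealth-weighted geometric mixture
        m = np.exp(w @ np.log(p))
    else:
        e = (1.0 - alpha) / 2.0
        m = (w @ p**e) ** (1.0 / e)
    return m / m.sum()  # renormalise to a proper distribution
```

For instance, with equal wealth and `alpha = -1` the result is the ordinary average of the agents' distributions.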
Exploiting diversity for efficient machine learning
A common practice for solving machine learning problems is currently to consider
each problem in isolation, starting from scratch every time a new learning problem
is encountered or a new model is proposed. This is a perfectly feasible solution
when the problems are sufficiently easy or when, for a hard problem, a large
amount of resources, in terms of both training data and computation, is
available. Although this naive approach has been the main focus of research in
machine learning for a few decades and has had considerable success, it becomes
infeasible when the problem is too hard relative to the available resources.
When using a complex model in this naive approach, it is necessary to collect a
large data set (if that is possible at all) to avoid overfitting, and hence also
to use large computational resources to handle the increased amount of data:
first during training, to process the large data set, and then at test time, to
execute the complex model.
An alternative to this strategy of treating each learning problem independently
is to leverage related data sets and computation encapsulated in previously
trained models. By doing so we can decrease the amount of data necessary to
reach a satisfactory level of performance and, consequently, improve the
achievable accuracy and decrease training time. Our approach to this problem is to exploit
diversity - in the structure of the data set, in the features learnt and in the
inductive biases of different neural network architectures.
In the setting of learning from multiple sources we introduce multiple-source
cross-validation, which gives an unbiased estimator of the test error when the data
set is composed of data coming from multiple sources and the data at test time are
coming from a new unseen source. We also propose new estimators of variance of
the standard k-fold cross-validation and multiple-source cross-validation, which
have lower bias than previously known ones.
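The multiple-source procedure can be illustrated as a leave-one-source-out split, where each fold's test set mimics data arriving from a new, unseen source. This is a minimal sketch; the function name and interface are illustrative:

```python
from collections import defaultdict

def multiple_source_cv_splits(sources):
    """Yield (train_idx, test_idx) pairs, holding out one source per fold.

    sources: a list of source labels, one per data point. Each fold's test
    set contains all points from one source, mimicking evaluation on data
    from a previously unseen source.
    """
    by_source = defaultdict(list)
    for i, s in enumerate(sources):
        by_source[s].append(i)
    for held_out in by_source:
        test = by_source[held_out]
        train = [i for s, idxs in by_source.items()
                 if s != held_out for i in idxs]
        yield train, test
```

With three sources, this yields three folds, each testing on exactly one source's data.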
To improve unsupervised learning we introduce scheduled denoising autoencoders,
which learn a more diverse set of features than the standard denoising
autoencoder. This is thanks to their training procedure, which starts with a
high level of noise, during which the network learns coarse features, and then
gradually lowers the noise, allowing the network to also learn more local
features. A connection between this training procedure and curriculum learning
is also drawn. We develop the idea of learning a diverse representation further
by explicitly incorporating the goal of obtaining a diverse representation into the
training objective. The proposed model, the composite denoising autoencoder,
learns multiple subsets of features focused on modelling variations in the data set
at different levels of granularity.
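The composite objective can be sketched roughly as follows, with each subset of hidden features reconstructing the input from its own corruption level. The masking noise, tied weights, squared-error loss, and function names are simplifying assumptions for illustration, not the thesis's exact formulation:

```python
import numpy as np

def masking_noise(x, level, rng):
    """Corrupt x by zeroing each entry independently with probability `level`."""
    return x * (rng.random(x.shape) >= level)

def composite_dae_loss(x, W_list, noise_levels, rng):
    """Illustrative composite-DAE objective.

    Each weight block W in W_list defines one subset of hidden features;
    the subset trained with high noise models coarse variation, the subset
    trained with low noise models fine-grained variation.
    """
    total = 0.0
    for W, level in zip(W_list, noise_levels):
        x_noisy = masking_noise(x, level, rng)
        h = np.tanh(x_noisy @ W)   # subset-specific features
        x_hat = h @ W.T            # tied-weight reconstruction
        total += np.mean((x - x_hat) ** 2)
    return total
```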
Finally, we introduce the idea of model blending, a variant of model compression,
in which the two models, the teacher and the student, are both strong
models, but different in their inductive biases. As an example, we train convolutional
networks using the guidance of bidirectional long short-term memory
(LSTM) networks. This allows us to train the convolutional neural network to be
more accurate than the LSTM network at no extra cost at test time.
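A distillation-style sketch of the blending objective: the student (e.g. a convolutional network) is trained against both the hard labels and the teacher's (e.g. an LSTM's) soft predictions. The exact loss form and the `lam` trade-off are assumptions for illustration, not the thesis's precise objective:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def blending_loss(student_logits, teacher_probs, labels, lam=0.5):
    """Hard-label cross-entropy plus a cross-entropy term that matches
    the teacher's soft predictions; lam balances the two terms."""
    p = softmax(student_logits)
    n = len(labels)
    hard = -np.log(p[np.arange(n), labels]).mean()
    soft = -(teacher_probs * np.log(p)).sum(axis=1).mean()
    return (1 - lam) * hard + lam * soft
```

A student whose logits agree with both the label and the teacher incurs a lower loss than one that contradicts them.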
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Multi-modal fusion approaches aim to integrate information from different
data sources. Unlike natural datasets, such as in audio-visual applications,
where samples consist of "paired" modalities, data in healthcare is often
collected asynchronously. Hence, requiring the presence of all modalities for a
given sample is not realistic for clinical tasks and significantly limits the
size of the dataset during training. In this paper, we propose MedFuse, a
conceptually simple yet promising LSTM-based fusion module that can accommodate
uni-modal as well as multi-modal input. We evaluate the fusion method and
introduce new benchmark results for in-hospital mortality prediction and
phenotype classification, using clinical time-series data in the MIMIC-IV
dataset and corresponding chest X-ray images in MIMIC-CXR. Compared to more
complex multi-modal fusion strategies, MedFuse provides a performance
improvement by a large margin on the fully paired test set. It also remains
robust across the partially paired test set containing samples with missing
chest X-ray images. We release our code for reproducibility and to enable the
evaluation of competing models in the future.
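The fusion idea can be sketched as an LSTM that consumes one embedding per available modality, so a sample with a missing chest X-ray simply yields a shorter input sequence. Dimensions, initialisation, and the class name `TinyLSTMFusion` are illustrative, not the released implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMFusion:
    """Sketch of an LSTM-based fusion module: each available modality
    embedding is one timestep, so uni-modal and multi-modal samples are
    handled uniformly. Weights are random for illustration."""
    def __init__(self, d_in, d_hid, rng):
        s = 0.1
        self.W = rng.normal(0, s, (4 * d_hid, d_in))
        self.U = rng.normal(0, s, (4 * d_hid, d_hid))
        self.b = np.zeros(4 * d_hid)
        self.d = d_hid

    def __call__(self, modality_embeddings):
        h = np.zeros(self.d)
        c = np.zeros(self.d)
        for x in modality_embeddings:  # one step per available modality
            z = self.W @ x + self.U @ h + self.b
            i, f, o, g = np.split(z, 4)
            i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
            c = f * c + i * g
            h = o * np.tanh(c)
        return h  # fused representation, fed to a classifier head
```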
Scheduled Denoising Autoencoders
We present a representation learning method that learns features at multiple different levels of scale. Working within the unsupervised framework of denoising autoencoders, we observe that when the input is heavily corrupted during training, the network tends to learn coarse-grained features, whereas when the input is only slightly corrupted during training, the network tends to learn fine-grained features. This motivates the scheduled denoising autoencoder, which starts with a high level of input noise that lowers as training progresses. We find that the resulting representation yields a significant boost on a later supervised task compared to the original input, or to a standard denoising autoencoder trained at a single noise level.
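The schedule described above can be sketched as a training loop whose masking-noise level decays linearly from high to low. The tied weights, squared-error loss, and hyperparameters are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def train_scheduled_dae(X, d_hid=16, epochs=20,
                        noise_hi=0.7, noise_lo=0.1, lr=0.1, seed=0):
    """Sketch of a scheduled denoising autoencoder: masking noise starts
    high (encouraging coarse features) and is lowered linearly over
    training (allowing finer features)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, d_hid))
    for epoch in range(epochs):
        # linear noise schedule from noise_hi down to noise_lo
        level = noise_hi + (noise_lo - noise_hi) * epoch / max(epochs - 1, 1)
        Xn = X * (rng.random(X.shape) >= level)  # masking corruption
        H = np.tanh(Xn @ W)                      # encode corrupted input
        Xh = H @ W.T                             # tied-weight reconstruction
        err = Xh - X
        # gradient of the squared error w.r.t. the tied weights W
        dH = (err @ W) * (1 - H**2)
        gW = (Xn.T @ dH + err.T @ H) / n
        W -= lr * gW
    return W
```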
Multiple-source cross-validation
Cross-validation is an essential tool in machine learning and statistics. The typical procedure, in which data points are randomly assigned to one of the test sets, makes an implicit assumption that the data are exchangeable. A common case in which this does not hold is when the data come from multiple sources, in the sense used in transfer learning. In this case it is common to arrange the cross-validation procedure in a way that takes the source structure into account. Although common in practice, this procedure does not appear to have been theoretically analysed. We present new estimators of the variance of the cross-validation, both in the multiple-source setting and in the standard iid setting. These new estimators allow for much more accurate confidence intervals and hypothesis tests to compare algorithms.
Breast density classification with deep convolutional neural networks
Breast density classification is an essential part of breast cancer
screening. Although a lot of prior work has considered this problem as a task
for learning algorithms, to our knowledge, all of it used small and clinically
unrealistic data sets, both for training and for evaluation of the models. In
this work, we explore the limits of this task with a data set coming from over
200,000 breast cancer screening exams. We use this data to train and evaluate a
strong convolutional neural network classifier. In a reader study, we find that
our model can perform this task comparably to a human expert.